Pump-priming PASCAL proposal: Large Margin Algorithms and Kernel Methods for Speech Applications
Abstract
Research on large margin algorithms in conjunction with kernel methods has been both exciting and successful. While there have been quite a few preliminary successes in applying kernel methods to speech applications, most research efforts have focused on non-temporal problems such as text classification and optical character recognition (OCR). We propose to design, analyze, and implement learning algorithms and kernels for hierarchical-temporal speech utterances. Our first and primary end goal is to build and thoroughly test a full-blown speech phoneme classifier that will be trained on millions of examples and will achieve the best results in this domain. We also plan to apply and test the resulting algorithms and kernels on other supervised problems in spoken language, such as language identification, word spotting from phonemes, and speaker verification.
1 Background and Motivation
We propose an algorithmic research framework for supervised speech analysis problems that builds on recent advances in kernel methods and large margin classifiers. Specific applications include (but are not limited to) speech phoneme classification, spoken language identification, closed-vocabulary word recognition/spotting, and speaker verification. Large margin methods such as support vector machines and boosting algorithms have been shown to be effective in many decision tasks. While there has been some work on large margin methods for complex decision problems, most research still focuses on less complex structures. Speech signals, however, exhibit a complex temporal structure in both the input space (the acoustic signal) and the target space (e.g. phonetic transcription and word transcription). For instance, speech signals exhibit multi-scale temporal behavior, and the set of speech phonemes is organized in a hierarchical structure.
The current state-of-the-art approaches to handling such speech signals are based on generative models that capture some temporal dependencies, such as hidden Markov models (HMMs). (See for instance [9] and the many references therein.) Several properties have made HMMs the tool of choice in speech signal processing: their ability to constrain the space of possible solutions to legal and probable sequences of phonemes/words through the design of speech-specific Markovian state topologies, and their insensitivity to large unbalanced datasets, which follows from their generative nature.
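To illustrate how a Markovian state topology restricts decoding to legal sequences, the following is a minimal sketch rather than anything taken from the proposal. The three-state left-to-right topology, the transition probabilities in `TRANS`, and the per-frame likelihoods in `frames` are all hypothetical values chosen for illustration. Viterbi decoding is the standard dynamic program for finding the most probable state path.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


# Hypothetical left-to-right topology over three states: each state may
# repeat or advance by one; every other transition has probability 0.
TRANS = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
START = [1.0, 0.0, 0.0]  # decoding must begin in state 0


def viterbi(frame_probs, trans=TRANS, start=START):
    """Return the most probable state path given per-frame state likelihoods."""
    n = len(start)
    score = [log(start[s]) + log(frame_probs[0][s]) for s in range(n)]
    back = []
    for probs in frame_probs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor of state s under the topology.
            best = max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
            score.append(prev[best] + log(trans[best][s]) + log(probs[s]))
            ptr.append(best)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Illustrative likelihoods: frame 2 locally favours a return to state 0,
# which the topology forbids.
frames = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
framewise = [max(range(3), key=lambda s: f[s]) for f in frames]
path = viterbi(frames)
print("framewise argmax:", framewise)
print("viterbi path:    ", path)
```

Choosing the best state independently for each frame yields a sequence that jumps backwards and skips a state, whereas the decoded path can only stay in a state or advance by one, because every forbidden transition carries zero probability.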
Similar resources
Large Margin Algorithms for Discriminative Continuous Speech Recognition
Automatic speech recognition (ASR) has long been considered a dream. While ASR does work today and is commercially available, it is extremely sensitive to noise, talker variations, and environments. The current state-of-the-art automatic speech recognizers are based on generative models that capture some temporal dependencies, such as hidden Markov models (HMMs). While HMMs have been immensely imp...
Utilizing Kernel Adaptive Filters for Speech Enhancement within the ALE Framework
The performance of the linear models widely used within the framework of adaptive line enhancement (ALE) deteriorates dramatically in the presence of non-Gaussian noise. On the other hand, adaptive implementation of nonlinear models, e.g. Volterra filters, suffers from the severe problems of a large number of parameters and slow convergence. Nonetheless, kernel methods are emerging solutions t...
Support Vector Machines in High Energy Physics
This lecture will introduce the Support Vector algorithms for classification and regression. They are an application of the so-called kernel trick, which allows the extension of a certain class of linear algorithms to the nonlinear case. The kernel trick will be introduced and, in the context of structural risk minimization, large margin algorithms for classification and regression will be pres...
Introduction. Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods. 1.1 The Traditional Approach to Speech Processing
One of the most natural communication tools used by humans is their voice. It is hence natural that a lot of research has been devoted to analyzing and understanding human uttered speech for various applications. The most obvious one is automatic speech recognition, where the goal is to transcribe a recorded speech utterance into its corresponding sequence of words. Other applications include speaker...
Large Margin Training of Acoustic Models for Speech Recognition
Fei Sha. Advisor: Prof. Lawrence K. Saul. Automatic speech recognition (ASR) depends critically on building acoustic models for linguistic units. These acoustic models usually take the form of continuous-density hidden Markov models (CD-HMMs), whose parameters are obtained by maximum likelihood estimation. Recently, however, there ha...